STAT 456: Introduction to Statistical Theory

Lecture 1: Review — MGFs, Transformations, Common Distributions

Soumojit Das

2026-01-13

Objectives

Welcome to STAT 456: Mathematical Statistics

Today’s objectives — Review key tools from 443 that underpin everything in this course:

  • Moment generating functions (MGFs)
  • Transformations of random variables
  • Key distributional relationships

Moment Generating Functions — Recall

Definition (MGF)

The moment generating function of a random variable \(X\) is \[M_X(t) = E[e^{tX}]\] provided this expectation exists for all \(t\) in some interval \((-h, h)\) for \(h > 0\).

  • Discrete: \(M_X(t) = \sum_x e^{tx} P(X=x)\)
  • Continuous: \(M_X(t) = \int_{-\infty}^{\infty} e^{tx} f_X(x) \, dx\)
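
These two formulas can be checked directly in a discrete case. A minimal sketch, summing the Binomial pmf against the closed-form Binomial MGF \((pe^t + 1-p)^n\) (the values of \(n\), \(p\), \(t\) are illustrative):

```python
import math

# Direct check of the discrete MGF definition for Binomial(n, p):
# sum_x e^{tx} P(X = x) should equal the closed form (p e^t + 1 - p)^n.
n, p, t = 8, 0.3, 0.5
direct = sum(
    math.exp(t * x) * math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    for x in range(n + 1)
)
closed_form = (p * math.exp(t) + 1 - p) ** n
print(direct, closed_form)
```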

Why a Neighborhood of 0?

The obvious question: Why do we require \(M_X(t)\) to exist for \(t \in (-h, h)\) and not just at \(t = 0\)?

The trivial answer: At \(t=0\), every random variable has \[M_X(0) = E[e^0] = E[1] = 1\] This tells us nothing about \(X\)!

The deeper reason: We need derivatives to recover moments: \[E[X^n] = M_X^{(n)}(0) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}\]

Taking derivatives at \(t = 0\) requires \(M_X\) to be defined on an open interval around 0. In fact, when the MGF exists on \((-h, h)\), it is infinitely differentiable (indeed analytic) there, so every moment can be recovered.

Why a Neighborhood? (continued)

The Philosophical Point

The MGF encodes the entire distribution into a single function. But this requires the distribution’s tails to be “light enough” that \(E[e^{tX}]\) converges.

Example of failure: The Cauchy distribution

\[f(x) = \frac{1}{\pi(1 + x^2)}\]

  • Even \(E[|X|]\) doesn’t exist!
  • \(M_X(t)\) is finite only at the single point \(t = 0\)
  • Heavy tails → moments explode → MGF undefined

Discussion: What do heavy-tailed distributions tell us about real-world phenomena? When should we expect MGFs to fail?
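
A quick numerical illustration of the Cauchy failure: truncate the defining integral to \([-R, R]\). At \(t = 0\) the truncated integral settles near 1, but for any \(t \neq 0\) it grows without bound as \(R\) increases. A sketch (the trapezoidal grid size is an arbitrary choice):

```python
import math

def cauchy_mgf_partial(t, R, n=200_000):
    # Trapezoidal approximation of the truncated integral of
    # e^{t x} / (pi (1 + x^2)) over [-R, R].
    h = 2 * R / n
    total = 0.0
    for i in range(n + 1):
        x = -R + i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(t * x) / (math.pi * (1 + x * x))
    return total * h

# At t = 0 the truncated integral is just the density's mass, approaching 1:
print(cauchy_mgf_partial(0.0, 1000))
# For t = 0.5 it explodes as the truncation point R grows:
for R in (10, 50, 100):
    print(R, cauchy_mgf_partial(0.5, R))
```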

MGF Properties — Recall

Theorem (Generating Moments)

If \(X\) has mgf \(M_X(t)\), then \[E[X^n] = M_X^{(n)}(0) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}\]

Theorem (Linear Transformation)

For constants \(a, b\): \[M_{aX+b}(t) = e^{bt} M_X(at)\]
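
The linear-transformation rule is easy to sanity-check numerically. A sketch using the Normal MGF \(\exp(\mu t + \sigma^2 t^2/2)\), with illustrative values of \(\mu\), \(\sigma^2\), \(a\), \(b\):

```python
import math

def normal_mgf(t, mu, sigma2):
    # MGF of N(mu, sigma2): exp(mu t + sigma2 t^2 / 2)
    return math.exp(mu * t + sigma2 * t * t / 2)

# If X ~ N(mu, sigma2), then aX + b ~ N(a*mu + b, a^2 * sigma2), so the
# left side below is M_{aX+b}(t); the right side is e^{bt} M_X(at).
mu, sigma2, a, b = 1.5, 2.0, 3.0, -0.7
ts = (-0.4, 0.1, 0.9)
lhs = [normal_mgf(t, a * mu + b, a * a * sigma2) for t in ts]
rhs = [math.exp(b * t) * normal_mgf(a * t, mu, sigma2) for t in ts]
print(lhs)
print(rhs)
```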

Why “Moment Generating”?

The name isn’t arbitrary!

Expand \(M_X(t) = E[e^{tX}]\) using the Taylor series of \(e^{tX}\), interchanging expectation and sum (justified when the MGF exists in a neighborhood of 0): \[M_X(t) = E\left[\sum_{k=0}^\infty \frac{(tX)^k}{k!}\right] = \sum_{k=0}^\infty \frac{t^k}{k!} E[X^k]\]

So: \[M_X(t) = 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots\]

The moments are the Taylor coefficients!

Take the \(n\)-th derivative and evaluate at \(t=0\): \[M_X^{(n)}(0) = E[X^n]\]
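
Numerically, differentiating at 0 really does produce the moments. A finite-difference sketch for the Poisson MGF \(e^{\lambda(e^t - 1)}\) (\(\lambda\) and the step size are illustrative choices):

```python
import math

lam = 2.5
def M(t):
    # Poisson(lam) MGF: exp(lam (e^t - 1))
    return math.exp(lam * (math.exp(t) - 1))

h = 1e-4
# Central differences approximate the derivatives at 0:
m1 = (M(h) - M(-h)) / (2 * h)              # ~ M'(0)  = E[X]   = lam
m2 = (M(h) - 2 * M(0) + M(-h)) / (h * h)   # ~ M''(0) = E[X^2] = lam + lam^2
print(m1, m2)
```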

The Insight

The MGF is the generating function for the moment sequence \(\{E[X^n]\}_{n=0}^\infty\).

This connects to:

  • Combinatorics: Generating functions encode sequences
  • Complex analysis: Analytic functions determined by Taylor coefficients

MGF Uniqueness Theorem — Statement Only

Theorem (MGF Uniqueness)

Let \(F_X\) and \(F_Y\) be two cdfs whose mgfs exist.

If \(M_X(t) = M_Y(t)\) for all \(t\) in some neighborhood of 0, then \(F_X(u) = F_Y(u)\) for all \(u\).

Key Point: This is why MGFs are so useful for identifying distributions — if two random variables have the same MGF, they have the same distribution!

The Power of Uniqueness

Why is this theorem so powerful?

Without uniqueness, to verify two random variables have the same distribution, we’d need to check: \[F_X(u) = F_Y(u) \quad \text{for } \textbf{every} \, u \in \mathbb{R}\]

That’s an infinite number of checks!

With uniqueness, we only need to verify: \[M_X(t) = M_Y(t) \quad \text{for } t \in (-h, h)\]

A single functional equation replaces infinitely many pointwise checks.

What Does Uniqueness Give Us?

Three Powerful Tools

  1. Algebraic verification instead of analytic: Match formulas, not CDFs
  2. Compositionality: \(M_{X+Y}(t) = M_X(t) M_Y(t)\) when \(X \perp Y\)
  3. Problem-solving tool: Compute \(M_Z(t)\), recognize the form, identify \(Z\)’s distribution

This course will repeatedly use this pattern:

  • Find the MGF of a complicated statistic
  • Simplify using algebra and independence
  • Match to a known distribution via uniqueness
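
A minimal sketch of this pattern with illustrative values: for independent \(X, Y \sim \text{Exponential}(\beta)\), the MGF product is \((1-\beta t)^{-1}(1-\beta t)^{-1} = (1-\beta t)^{-2}\), the Gamma\((2, \beta)\) MGF, so \(X + Y \sim \text{Gamma}(2, \beta)\) by uniqueness. Monte Carlo agrees:

```python
import math, random

# Empirical MGF of X + Y for independent Exponential(beta) variables,
# compared against the Gamma(2, beta) MGF (1 - beta t)^{-2}.
random.seed(0)
beta, t, n = 2.0, 0.2, 200_000
sums = [random.expovariate(1 / beta) + random.expovariate(1 / beta)
        for _ in range(n)]
empirical_mgf = sum(math.exp(t * s) for s in sums) / n
gamma_mgf = (1 - beta * t) ** -2
print(empirical_mgf, gamma_mgf)
```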

Discussion: What other areas of mathematics leverage “finding the right representation” to simplify problems?

MGF Convergence Theorem — Statement Only

Theorem (Convergence of MGFs)

Suppose \(\{X_i\}\) is a sequence of random variables with mgfs \(M_{X_i}(t)\).

If \(\lim_{i \to \infty} M_{X_i}(t) = M_X(t)\) for all \(t\) in a neighborhood of 0, and \(M_X(t)\) is an mgf, then \[\lim_{i \to \infty} F_{X_i}(x) = F_X(x)\] at all continuity points of \(F_X\).

Application: Key for asymptotics — we’ll use this for MLE asymptotic normality later in the course.

Common MGFs — Reference Table

| Distribution | Parameters | MGF \(M_X(t)\) | Constraint |
|---|---|---|---|
| Normal | \(\mu, \sigma^2\) | \(\exp(\mu t + \sigma^2 t^2/2)\) | all \(t\) |
| Gamma | \(\alpha, \beta\) | \((1 - \beta t)^{-\alpha}\) | \(t < 1/\beta\) |
| Chi-squared | \(p\) (df) | \((1-2t)^{-p/2}\) | \(t < 1/2\) |
| Exponential | \(\beta\) | \((1 - \beta t)^{-1}\) | \(t < 1/\beta\) |
| Binomial | \(n, p\) | \((pe^t + 1-p)^n\) | all \(t\) |
| Poisson | \(\lambda\) | \(\exp(\lambda(e^t - 1))\) | all \(t\) |

Transformations — Recall Jacobian Method

Univariate case:

If \(Y = g(X)\) with \(g\) strictly monotone and differentiable: \[f_Y(y) = f_X(g^{-1}(y)) \cdot \left| \frac{d}{dy} g^{-1}(y) \right|\]

Multivariate case:

If \(\mathbf{Y} = g(\mathbf{X})\) with \(g: \mathbb{R}^n \to \mathbb{R}^n\) a diffeomorphism: \[f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(g^{-1}(\mathbf{y})) \cdot |J|\]

where \(|J|\) is the absolute value of the Jacobian determinant of \(g^{-1}\).
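
A worked univariate sketch (the choice \(X \sim \text{Exponential}(1)\), \(Y = X^2\) is illustrative): here \(g^{-1}(y) = \sqrt{y}\) and \(|\frac{d}{dy} g^{-1}(y)| = \frac{1}{2\sqrt{y}}\), so the formula gives \(f_Y(y) = e^{-\sqrt{y}}/(2\sqrt{y})\) for \(y > 0\), which we can check by integration and by simulation:

```python
import math, random

def f_Y(y):
    # Density of Y = X^2 when X ~ Exponential(1), from the Jacobian method:
    # f_Y(y) = f_X(sqrt(y)) * 1/(2 sqrt(y)) = exp(-sqrt(y)) / (2 sqrt(y))
    return math.exp(-math.sqrt(y)) / (2 * math.sqrt(y))

# P(Y <= 1) should equal P(X <= 1) = 1 - e^{-1} = 0.6321...
n = 200_000
h = 1.0 / n
integral = sum(f_Y((i + 0.5) * h) * h for i in range(n))  # midpoint rule
random.seed(2)
m = 50_000
frac = sum(random.expovariate(1.0) ** 2 <= 1 for _ in range(m)) / m
print(integral, frac, 1 - math.exp(-1))
```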

Key Distributional Relationships

Important Relationships

  • If \(Z \sim N(0,1)\), then \(Z^2 \sim \chi^2_1\)
  • If \(U \sim \chi^2_p\) and \(V \sim \chi^2_q\) independent, then \(U + V \sim \chi^2_{p+q}\)
  • \(\chi^2_p\) is the \(\text{Gamma}(p/2, 2)\) distribution (shape \(p/2\), scale \(2\))
  • If \(X_1, \ldots, X_n \sim \text{Gamma}(\alpha, \beta)\) independent, then \(\sum X_i \sim \text{Gamma}(n\alpha, \beta)\)
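
A Monte Carlo sanity check of the first relationship (sample size and seed are arbitrary): if \(Z^2 \sim \chi^2_1\), its simulated mean and variance should be near the \(\chi^2_1\) values 1 and 2.

```python
import random

# Simulate Z^2 for Z ~ N(0, 1) and compare sample moments with the
# chi-squared(1) mean (= 1) and variance (= 2).
random.seed(1)
n = 200_000
zsq = [random.gauss(0.0, 1.0) ** 2 for _ in range(n)]
mean = sum(zsq) / n
var = sum((v - mean) ** 2 for v in zsq) / n
print(mean, var)
```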

Example — Gamma MGF Derivation

Let \(X \sim \text{Gamma}(\alpha, \beta)\). Then: \[M_X(t) = \frac{1}{\Gamma(\alpha)\beta^\alpha} \int_0^\infty e^{tx} x^{\alpha-1} e^{-x/\beta} \, dx\]

Combine the exponentials, \(e^{tx} e^{-x/\beta} = e^{-x(1-\beta t)/\beta}\), and recognize the integrand as the kernel of a Gamma\((\alpha, \beta/(1-\beta t))\) density (valid for \(t < 1/\beta\)). The integral therefore equals \(\Gamma(\alpha)\left(\beta/(1-\beta t)\right)^\alpha\), so \[M_X(t) = \left( \frac{1}{1 - \beta t} \right)^\alpha, \quad t < \frac{1}{\beta}\]

Finding moments: \[E[X] = M'_X(0) = \alpha\beta\] \[E[X^2] = M''_X(0) = \alpha(\alpha+1)\beta^2\] \[\text{Var}(X) = \alpha\beta^2\]
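
The closed form can be verified against direct numerical integration of \(E[e^{tX}]\) (parameter values are illustrative, chosen so that \(t < 1/\beta\)):

```python
import math

alpha, beta, t = 2.5, 1.2, 0.3   # note t < 1/beta

def gamma_pdf(x):
    # Gamma(alpha, beta) density, scale parameterization as in the lecture
    return x ** (alpha - 1) * math.exp(-x / beta) / (math.gamma(alpha) * beta ** alpha)

# Midpoint-rule approximation of E[e^{tX}] on a wide truncated range
n, xmax = 400_000, 80.0
h = xmax / n
numeric = sum(math.exp(t * (i + 0.5) * h) * gamma_pdf((i + 0.5) * h) * h
              for i in range(n))
closed_form = (1 - beta * t) ** -alpha
print(numeric, closed_form)
```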

Example — Poisson Approximation via MGF Convergence

Setup: \(X \sim \text{Binomial}(n, p)\) with \(\lambda = np\) fixed as \(n \to \infty\)

Binomial MGF: \(M_X(t) = (pe^t + 1-p)^n\)

Poisson MGF: \(M_Y(t) = e^{\lambda(e^t - 1)}\) where \(Y \sim \text{Poisson}(\lambda)\)

Convergence: With \(p = \lambda/n\): \[M_X(t) = \left(1 + \frac{\lambda(e^t - 1)}{n}\right)^n \to e^{\lambda(e^t - 1)} = M_Y(t)\]

By MGF Convergence Theorem: Binomial\((n, \lambda/n) \xrightarrow{d}\) Poisson\((\lambda)\)
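
The convergence can be watched numerically (\(\lambda\) and \(t\) are illustrative values):

```python
import math

# (1 + lam (e^t - 1)/n)^n should approach the Poisson MGF exp(lam (e^t - 1))
# as n grows, which is exactly the MGF convergence used above.
lam, t = 3.0, 0.4
target = math.exp(lam * (math.exp(t) - 1))
approx = {n: (1 + lam * (math.exp(t) - 1) / n) ** n for n in (10, 100, 10_000)}
for n, value in approx.items():
    print(n, value, target)
```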

Looking Ahead

Course roadmap:

MGFs
  ↓
Sampling distributions (next)
  ↓
Sufficiency
  ↓
Point estimation (MLE, UMVUE)
  ↓
Hypothesis testing

The unifying theme: Extracting information efficiently

  • MGFs: Transform distributions into functions we can manipulate
  • Sufficiency: Compress data without losing information
  • Point estimation: Extract parameters from samples optimally
  • Hypothesis testing: Extract decisions from evidence rigorously

The Philosophy

Each concept involves finding the right representation to make hard problems tractable. MGF uniqueness is just the first example of this powerful pattern.

Next lecture: Sampling from the normal distribution — \(\bar{X}\), \(S^2\), and their independence